52 research outputs found

    Multiple Contributions to Interactive Transcription and Translation of Old Text Documents

    There are huge historical document collections residing in libraries, museums and archives that are currently being digitized for preservation purposes and to make them available worldwide through large, on-line digital libraries. The main objective, however, is not simply to provide access to raw images of digitized documents, but to annotate them with their real informative content and, in particular, with text transcriptions and, if convenient, text translations too. This work aims at contributing to the development of advanced techniques and interfaces for the analysis, transcription and translation of images of old archive documents, following an interactive-predictive approach. Serrano Martínez-Santos, N. (2009). Multiple Contributions to Interactive Transcription and Translation of Old Text Documents. http://hdl.handle.net/10251/11272

    Interactive Transcription of Old Text Documents

    Nowadays, there are huge collections of handwritten text documents in libraries all over the world. The high demand for these resources has led to the creation of digital libraries in order to facilitate the preservation of these documents and provide electronic access to them. However, text transcriptions of these document images are not always available to allow users to quickly search for information, or computers to process the information, search for patterns or draw out statistics. The problem is that manual transcription of these documents is an expensive task in terms of both cost and time. This thesis presents a novel approach for efficient Computer Assisted Transcription (CAT) of handwritten text documents using state-of-the-art Handwriting Text Recognition (HTR) systems. The objective of CAT approaches is to efficiently complete a transcription task through human-machine collaboration, as the effort required to generate a manual transcription is high, and automatically generated transcriptions from state-of-the-art systems still do not reach the accuracy required. This thesis is centred on a special application of CAT, that is, the transcription of old text documents when the quantity of user effort available is limited, and thus the entire document cannot be revised. In this approach, the objective is to generate the best possible transcription by means of the user effort available. This thesis provides a comprehensive view of the CAT process from feature extraction to user interaction. First, a statistical approach to generalise interactive transcription is proposed. As its direct application is unfeasible, some assumptions are made to apply it to two different tasks: first, the interactive transcription of handwritten text documents, and next, the interactive detection of the document layout. Next, the digitisation and annotation process of two real old text documents is described. 
This process was carried out because of the scarcity of similar resources and the need for annotated data to thoroughly test all the tools and techniques developed in this thesis. These two documents were carefully selected to represent the general difficulties that are encountered when dealing with HTR. Baseline results are presented on these two documents to establish a benchmark with a standard HTR system. Finally, these annotated documents were made freely available to the community. It must be noted that all the techniques and methods developed in this thesis have been assessed on these two real old text documents. Then, a CAT approach for HTR when user effort is limited is studied and extensively tested. The ultimate goal of applying CAT is achieved by putting together three processes. Given a recognised transcription from an HTR system, the first process consists in locating (possibly) incorrect words and employing the user effort available to supervise them (if necessary). As most words are not expected to be supervised due to the limited user effort available, only a few are selected to be revised. The system presents to the user a small subset of these words according to an estimation of their correctness or, to be more precise, according to their confidence level. Next, the second process starts once these low-confidence words have been supervised. This process updates the recognition of the document taking user corrections into consideration, which improves the quality of those words that were not revised by the user. Finally, the last process adapts the system from the partially revised (and possibly not perfect) transcription obtained so far. In this adaptation, the system intelligently selects the correct words of the transcription. As a result, the adapted system will better recognise future transcriptions. Transcription experiments using this CAT approach show that it is most effective when user effort is low. 
The last contribution of this thesis is a method for balancing the final transcription quality and the supervision effort applied using our previously described CAT approach. In other words, this method allows the user to control the amount of errors in the transcriptions obtained from a CAT approach. The motivation of this method is to let users decide on the final quality of the desired documents, as partially erroneous transcriptions can be sufficient to convey the meaning, and the user effort required to transcribe them might be significantly lower when compared to obtaining a totally manual transcription. Consequently, the system estimates the minimum user effort required to reach the amount of error defined by the user. Error estimation is performed by computing separately the error produced by each recognised word, and thus asking the user to revise only the ones in which most errors occur. Additionally, an interactive prototype is presented, which integrates most of the interactive techniques presented in this thesis. This prototype has been developed to be used by palaeographic experts, who do not have any background in HTR technologies. After slight fine-tuning by an HTR expert, the prototype lets transcribers manually annotate the document or employ the CAT approach presented. All automatic operations, such as recognition, are performed in the background, detaching the transcriber from the details of the system. The prototype was assessed by an expert transcriber and was shown to be adequate and efficient for its purpose. The prototype is freely available under the GNU General Public License (GPL). Serrano Martínez-Santos, N. (2014). Interactive Transcription of Old Text Documents [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/37979
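    The two ideas in this abstract, supervising only the least confident words under a limited effort budget and estimating the minimum effort needed to reach a user-defined error level, can be illustrated with a small sketch. It is not the thesis's method: the `Word` class, the function names, and the assumption that each word's confidence is a calibrated probability of being correct (so that `1 - confidence` is its expected error) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    confidence: float  # ASSUMPTION: estimated probability that the word is correct


def select_for_supervision(words: List[Word], budget: int) -> List[int]:
    """Return the indices of the `budget` least confident words, in reading order."""
    ranked = sorted(range(len(words)), key=lambda i: words[i].confidence)
    return sorted(ranked[:max(budget, 0)])


def min_effort_for_target(words: List[Word], max_error_rate: float) -> int:
    """Smallest number of words to supervise so that the expected error rate
    of the remaining (unsupervised) words is at most `max_error_rate`.

    ASSUMPTION: a supervised word is always corrected, so it contributes
    zero expected error afterwards.
    """
    if not words:
        return 0
    # Supervise the least confident words first.
    ranked = sorted(words, key=lambda w: w.confidence)
    residual = sum(1.0 - w.confidence for w in words)
    for k, word in enumerate(ranked):
        if residual / len(words) <= max_error_rate + 1e-12:
            return k
        residual -= 1.0 - word.confidence
    return len(words)


# Hypothetical recognised line with made-up confidence values.
line = [
    Word("the", 0.95),
    Word("olde", 0.40),
    Word("text", 0.90),
    Word("wos", 0.55),
    Word("found", 0.99),
]
to_review = select_for_supervision(line, budget=2)
effort = min_effort_for_target(line, max_error_rate=0.05)
```

    In this toy line, a budget of two supervisions goes to "olde" and "wos", and the same two supervisions are enough to bring the expected error rate under 5%. A real system would also re-recognise the unsupervised words after each correction, which this sketch leaves out.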

    Supporting language diversity of European MOOCs with the EMMA platform

    This paper introduces the cross-language support of the EMMA MOOC platform. Based on a discussion of language diversity in Europe, we introduce the development and evaluation of automated translation of texts and subtitling of videos from Dutch into English. The development of an Automatic Speech Recognition (ASR) system and a Statistical Machine Translation (SMT) system is described. The resources employed and the evaluation approach are introduced. Initial evaluation results are presented. Finally, we provide an outlook into future research and development. This work is partially funded by the EU under the Competitiveness and Innovation Framework Program 2007-2017 (CIP) in the European Multiple MOOC Aggregator (EMMA) project. Grant Agreement no. 621030

    TransLectures

    transLectures (Transcription and Translation of Video Lectures) is an EU STREP project in which advanced automatic speech recognition and machine translation techniques are being tested on large video lecture repositories. The project began in November 2011 and will run for three years. This paper will outline the project's main motivation and objectives, and give a brief description of the two main repositories being considered: VideoLectures.NET and poliMedia. The first results obtained by the UPV group for the poliMedia repository will also be provided. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755. Funding was also provided by the Spanish Government (iTrans2 project, TIN2009-14511; FPI scholarship BES-2010-033005; FPU scholarship AP2010-4349). Silvestre Cerdà, JA.; Del Agua Teba, MA.; Garcés Díaz-Munío, GV.; Gascó Mora, G.; Giménez Pastor, A.; Martínez-Villaronga, AA.; Pérez González De Martos, AM.... (2012). TransLectures. IberSPEECH 2012. 345-351. http://hdl.handle.net/10251/37290

    The evolution of the ventilatory ratio is a prognostic factor in mechanically ventilated COVID-19 ARDS patients

    Background: Mortality due to COVID-19 is high, especially in patients requiring mechanical ventilation. The purpose of the study is to investigate associations between mortality and variables measured during the first three days of mechanical ventilation in patients with COVID-19 intubated at ICU admission. Methods: Multicenter, observational, cohort study including consecutive patients with COVID-19 admitted to 44 Spanish ICUs between February 25 and July 31, 2020, who required intubation at ICU admission and mechanical ventilation for more than three days. We collected demographic and clinical data prior to admission; information about clinical evolution at days 1 and 3 of mechanical ventilation; and outcomes. Results: Of the 2,095 patients with COVID-19 admitted to the ICU, 1,118 (53.3%) were intubated at day 1 and remained under mechanical ventilation at day 3. From days 1 to 3, PaO2/FiO2 increased from 115.6 [80.0-171.2] to 180.0 [135.4-227.9] mmHg and the ventilatory ratio from 1.73 [1.33-2.25] to 1.96 [1.61-2.40]. In-hospital mortality was 38.7%. A higher increase between ICU admission and day 3 in the ventilatory ratio (OR 1.04 [CI 1.01-1.07], p = 0.030) and creatinine levels (OR 1.05 [CI 1.01-1.09], p = 0.005) and a lower increase in platelet counts (OR 0.96 [CI 0.93-1.00], p = 0.037) were independently associated with a higher risk of death. No association between mortality and the PaO2/FiO2 variation was observed (OR 0.99 [CI 0.95-1.02], p = 0.47). Conclusions: A higher ventilatory ratio and its increase at day 3 are associated with mortality in patients with COVID-19 receiving mechanical ventilation at ICU admission. No association was found with the PaO2/FiO2 variation.

    Treatment with tocilizumab or corticosteroids for COVID-19 patients with hyperinflammatory state: a multicentre cohort study (SAM-COVID-19)

    Objectives: The objective of this study was to estimate the association between tocilizumab or corticosteroids and the risk of intubation or death in patients with coronavirus disease 19 (COVID-19) with a hyperinflammatory state according to clinical and laboratory parameters. Methods: A cohort study was performed in 60 Spanish hospitals including 778 patients with COVID-19 and clinical and laboratory data indicative of a hyperinflammatory state. Treatment was mainly with tocilizumab, an intermediate-high dose of corticosteroids (IHDC), a pulse dose of corticosteroids (PDC), combination therapy, or no treatment. Primary outcome was intubation or death; follow-up was 21 days. Propensity score-adjusted estimations using Cox regression (logistic regression if needed) were calculated. Propensity scores were used as confounders, matching variables and for the inverse probability of treatment weights (IPTWs). Results: In all, 88, 117, 78 and 151 patients treated with tocilizumab, IHDC, PDC, and combination therapy, respectively, were compared with 344 untreated patients. The primary endpoint occurred in 10 (11.4%), 27 (23.1%), 12 (15.4%), 40 (25.6%) and 69 (21.1%), respectively. The IPTW-based hazard ratios (odds ratio for combination therapy) for the primary endpoint were 0.32 (95%CI 0.22-0.47; p < 0.001) for tocilizumab, 0.82 (0.71-1.30; p 0.82) for IHDC, 0.61 (0.43-0.86; p 0.006) for PDC, and 1.17 (0.86-1.58; p 0.30) for combination therapy. Other applications of the propensity score provided similar results, but were not significant for PDC. Tocilizumab was also associated with lower hazard of death alone in IPTW analysis (0.07; 0.02-0.17; p < 0.001). Conclusions: Tocilizumab might be useful in COVID-19 patients with a hyperinflammatory state and should be prioritized for randomized trials in this situation.

    Post-Franco Theatre

    In the multiple realms and layers that comprise the contemporary Spanish theatrical landscape, “crisis” would seem to be the word that most often lingers in the air, as though it were a common mantra, ready to roll off the tongue of so many theatre professionals with such enormous ease, and even enthusiasm, that one is prompted to wonder whether it might indeed be a miracle that the contemporary technological revolution – coupled with perpetual quandaries concerning public and private funding for the arts – had not by now brought an end to the evolution of the oldest of live arts, or, at the very least, an end to drama as we know it

    Outcomes from elective colorectal cancer surgery during the SARS-CoV-2 pandemic

    This study aimed to describe the change in surgical practice and the impact of SARS-CoV-2 on mortality after surgical resection of colorectal cancer during the initial phases of the SARS-CoV-2 pandemic.

    Language technology for handwritten text recognition

    This paper shows how the technology now prevalent in handwritten text recognition (HTR) borrows concepts and methods from the field of automatic speech recognition (ASR), i.e. those based on Hidden Markov Models (HMMs). Additionally, an HTR approach is described that employs Bernoulli distributions rather than Gaussian-mixture distributions for the HMM-state emission probability of observations. Finally, handwritten text recognition evaluation results are reported for several corpora involving different characteristics and languages. Work supported by the EC (FEDER), the Spanish MEC under the MIPRCV “Consolider Ingenio 2010” research programme (CSD2007-00018) and the Spanish Government (MICINN and “Plan E”) under the MITTRAL (TIN2009-14633-C03-01) research project. Toselli, AH.; Serrano Martínez-Santos, N.; Giménez Pastor, A.; Khoury, I.; Juan Císcar, A.; Vidal Ruiz, E. (2012). Language technology for handwritten text recognition. In Advances in Speech and Language Technologies for Iberian Languages. Springer Verlag (Germany). 328:178-186. https://doi.org/10.1007/978-3-642-35292-8_19
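    To make the Bernoulli emission idea in this abstract concrete, the sketch below scores a binary feature vector (for example, one binarised pixel column of a text-line image) under a single multivariate Bernoulli distribution attached to one HMM state. It is only an illustration under assumptions: the function name, the clipping constant `eps`, and the use of a single Bernoulli prototype rather than a mixture are choices made here, not details taken from the paper.

```python
import math
from typing import Sequence


def bernoulli_log_emission(
    x: Sequence[int], p: Sequence[float], eps: float = 1e-6
) -> float:
    """Log-probability of binary vector `x` under independent Bernoullis `p`.

    log P(x | state) = sum_d [ x_d * log(p_d) + (1 - x_d) * log(1 - p_d) ]

    ASSUMPTION: `p[d]` is the state's probability that feature d equals 1;
    values are clipped to [eps, 1 - eps] to avoid log(0).
    """
    if len(x) != len(p):
        raise ValueError("x and p must have the same length")
    total = 0.0
    for x_d, p_d in zip(x, p):
        p_d = min(max(p_d, eps), 1.0 - eps)
        total += math.log(p_d) if x_d else math.log(1.0 - p_d)
    return total


# Hypothetical 3-pixel column and state prototype.
column = [1, 0, 1]
prototype = [0.9, 0.2, 0.8]
log_prob = bernoulli_log_emission(column, prototype)
```

    Here the emission probability is the product 0.9 × 0.8 × 0.8 = 0.576, computed in the log domain. Because the features are binary, no Gaussian mean or variance needs to be estimated for each dimension.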